A machine learning approach for differentiating bipolar disorder type II and borderline personality disorder using electroencephalography and cognitive abnormalities

This study addresses the challenge of differentiating between bipolar disorder II (BD II) and borderline personality disorder (BPD), which is complicated by overlapping symptoms. To overcome this, a multimodal machine learning approach was employed, incorporating both electroencephalography (EEG) patterns and cognitive abnormalities for enhanced classification. Data were collected from 45 participants, including 20 with BD II and 25 with BPD. Analysis involved utilizing EEG signals and cognitive tests, specifically the Wisconsin Card Sorting Test and Integrated Cognitive Assessment. The k-nearest neighbors (KNN) algorithm achieved a balanced accuracy of 93%, with EEG features proving to be crucial, while cognitive features had a lesser impact. Despite the strengths, such as diverse model usage, it’s important to note limitations, including a small sample size and reliance on DSM diagnoses. The study suggests that future research should explore multimodal data integration and employ advanced techniques to improve classification accuracy and gain a better understanding of the neurobiological distinctions between BD II and BPD.


Introduction
Borderline personality disorder (BPD) is characterized by hypersensitivity to rejection, resulting in instability in interpersonal relationships, self-image and behavior [1]. BPD has a prevalence of 2.7% in the general population [2]. Bipolar disorder (BD), on the other hand, involves recurrent mood episodes that range from depression to mania (BD I) or hypomania (BD II) [3]. BD affects 2% of the general population [4]. While both disorders cause significant impairment in the daily life of the affected individuals, the underlying mechanisms and treatment approaches differ. BPD is primarily treated with psychotherapy, whereas BD often requires a combination of medication and therapeutic interventions to manage mood fluctuations. As these two disorders significantly overlap in their features, accurately differentiating them has always been a diagnostic challenge. The initial diagnosis traditionally relies on a combination of comprehensive history taking and clinical symptoms, and there are currently no specific paraclinical tests available for definitively diagnosing these disorders [1][2][3][4][5][6].
By assessing the functional integrity of the brain, electroencephalography (EEG) may reveal potential distinctions between BPD and BD. However, no conclusive evidence exists for whether the two disorders can be differentiated by EEG features [5][7][8][9][10][11][12]. Nonetheless, studies have reported specific EEG findings in patients with BPD, such as intermittent rhythmic delta and theta activity observed during severe dissociative states characterized by inner tension and autoaggressive behavior [13]. Additionally, the presence of slow-wave activity and dysrhythmia has been documented in some BPD patients [14]. A correlation between positive spikes and heightened impulsivity has also been identified [13,15]. While these EEG observations provide intriguing insights, further research is necessary to establish their diagnostic utility.
Cognitive impairment is a prevalent feature of both BD II and BPD, significantly affecting crucial cognitive functions such as attention, memory, and executive function [16,17]. These impairments can disrupt patients' daily lives, necessitating targeted interventions to address their specific cognitive challenges. Notably, cognitive impairment in these disorders manifests in various ways, including impulsivity, emotional dysregulation, impaired social cognition, problem-solving and decision-making impairments, reduced processing speed, and visuospatial processing deficits [18,19]. It has been reported that patients with BD II or BPD exhibit poor performance across multiple neurocognitive domains, with similar deficits in cognitive flexibility and set-shifting, decision-making, sustained and selective attention, and problem-solving [20]. Furthermore, patients with BPD tend to display more pronounced inhibition deficits and perform more poorly in planning and attentional bias tasks than individuals with BD II [20][21][22].
A growing body of research indicates the potential of machine learning techniques in distinguishing between BD II and BPD, holding promise for improving diagnostic accuracy and treatment outcomes. However, the number of studies on this topic remains limited [23][24][25]. Machine learning algorithms can extract patterns and identify the variations between the two conditions by leveraging large datasets comprising clinical profiles, genetic markers, and neuroimaging data. However, continued research and validation are essential to ensure the reliability and generalizability of these models. Therefore, we aimed to evaluate the application of machine learning for differentiating BD II and BPD based on cognitive abnormalities and EEG features.
We included 45 participants, aged between 18 and 50 years, diagnosed with either BD II or BPD based on the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) criteria [26]. To avoid bias in the cognitive assessment of the patients, the exclusion criteria were life-threatening psychiatric conditions (e.g., suicidal thoughts), any other comorbid psychiatric disorder above the diagnostic threshold (e.g., schizophrenia), intellectual disability (based on clinical judgment), comorbid severe medical conditions (e.g., neurological disorders), and a history of head trauma, brain injury, or neurosurgery.
Noise arising from mains electricity (power-line interference), blinking, and muscle movements was initially removed using MATLAB's EEGLAB toolbox and its ICLabel and MARA plugins [32]. Data were filtered within the frequency range of 0.5 Hz to 32.5 Hz. Furthermore, to address other potential artifacts, independent component analysis was employed.
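The band-pass step described above can be illustrated in Python. This is a minimal sketch under stated assumptions, not the EEGLAB pipeline used in the study: the Butterworth design, the filter order, the 250 Hz sampling rate, and the function name `bandpass_eeg` are all illustrative choices that the text does not specify. Only the 0.5 Hz and 32.5 Hz cut-offs come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_eeg(signal, fs, low_hz=0.5, high_hz=32.5, order=4):
    """Zero-phase Butterworth band-pass filter applied along the last axis.

    The cut-offs follow the 0.5-32.5 Hz range given in the text; the filter
    family and order are assumptions made for this sketch.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal, axis=-1)


# Synthetic single-channel example (assumed sampling rate of 250 Hz).
fs = 250.0
t = np.arange(0, 10, 1 / fs)
alpha = np.sin(2 * np.pi * 10 * t)        # 10 Hz activity inside the pass band
mains = 0.5 * np.sin(2 * np.pi * 50 * t)  # 50 Hz power-line interference
raw = alpha + mains + 2.0                 # constant offset stands in for drift

filtered = bandpass_eeg(raw, fs)
```

In this toy signal the 10 Hz component passes almost unchanged, while the constant offset and most of the 50 Hz interference are suppressed. Ocular and muscle artifacts are not addressed by filtering alone, which is why component-based cleaning is applied as a separate step.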

Cognitive tests.
The Wisconsin Card Sorting Test (WCST), first used by Grant and Berg in 1948, evaluates perseveration versus flexibility, working memory, abstraction, and executive function (frontal lobe dysfunction) [27]. The Persian version of the test was used in the study [28]. The standard WCST consists of 128 cards (two sets of 64 cards) that differ in shape (triangle, cross, circle, and star), color (green, blue, red, and yellow), and number (one, two, three, and four). The participant is asked to sort the cards based on a pattern they find. However, the sorting pattern changes throughout the test, and the participant must figure out the new pattern through trial and error. After each answer, they receive feedback as 'correct' or 'incorrect.' WCST variables include total errors, total correct responses, perseverative errors (the participant starts the test with an initial incorrect guess and keeps answering based on that guess, or the number of times the participant persisted in using a previously successful sorting rule even though it was no longer correct), non-perseverative errors, categories completed, conceptual level response (the number of times the participant shifts to a new sorting rule without feedback from the examiner), trials to complete the first category, learning to learn, and failure to maintain set. In addition, the time taken to complete the test is measured [27,28,32,33].
The Integrated Cognitive Assessment (ICA) test is a rapid cognitive assessment tool based on humans' strong responses to animal stimuli. It consists of a rapid categorization task designed to evaluate the function of the higher-level visual cortex. In the learning phase, participants are presented with ten animal images [29]. The Persian version of the test was used in the study [30]. If participants perform above the chance level (greater than 50% accuracy), they proceed to the main task. However, if their performance falls below 50%, the test instructions are reiterated, and a new set of ten training pictures is presented. They progress to the main task if they perform above chance on this second attempt. The test is aborted if they perform below chance for the second time. The first ten images are removed from further analysis. Overall, one hundred natural images (50 animal and 50 non-animal) with various difficulty levels are presented to the participant, each for 100 ms, followed by a 20 ms interstimulus interval and a dynamic, noisy mask for 250 ms. Variables consist of accuracy (of categorization), speed (the participant's reaction time in trials they answered correctly), and the ICA index (incorporating the raw accuracy and speed results) [29,30,34,35].

Feature extraction.
The feature extraction process used statistical, spectral, and wavelet features, which were augmented using the Synthetic Minority Oversampling Technique (SMOTE) [36]. Subsequently, classification was performed using the K-Nearest Neighbor (KNN) classifier. MATLAB software was employed for the extraction of statistical characteristics and spectral features from the EEG signals, as well as for conducting wavelet analysis. KNN classification is often considered a favorable option for classifying EEG data due to its simplicity, ease of implementation, and ability to capture complex patterns in high-dimensional feature spaces. EEG signals, which represent the electrical activity of the brain, exhibit intricate temporal and spatial patterns that may not be easily modeled by more rigid algorithms. KNN's non-parametric nature allows it to adapt to the varying and nuanced nature of EEG data without making strong assumptions about the underlying distribution. Additionally, KNN excels in handling noisy data and can effectively discern subtle differences in EEG patterns, making it a versatile and reliable choice for EEG classification tasks where interpretability and adaptability are crucial [37].
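To make the notion of statistical and spectral features concrete, the following Python sketch computes a few of each for a single channel. It is an illustration only: the study used MATLAB, and the specific statistics, the band boundaries in `BANDS`, the Welch settings, and the function name `channel_features` are assumptions introduced here, not the feature set reported by the authors.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

# Conventional EEG band limits in Hz (assumed; the text does not list them).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def channel_features(x, fs):
    """Return simple statistical and band-power features for one EEG channel."""
    x = np.asarray(x, dtype=float)
    feats = {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "skewness": float(skew(x)),
        "kurtosis": float(kurtosis(x)),
    }
    # Welch power spectral density with 2-second segments (0.5 Hz resolution).
    freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
    df = freqs[1] - freqs[0]
    for name, (lo, hi) in BANDS.items():
        in_band = (freqs >= lo) & (freqs < hi)
        # Rectangle-rule approximation of the power inside the band.
        feats[f"{name}_power"] = float(np.sum(psd[in_band]) * df)
    return feats


# Synthetic channel dominated by 10 Hz activity (assumed 250 Hz sampling rate).
fs = 250.0
rng = np.random.default_rng(0)
t = np.arange(0, 8, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
features = channel_features(x, fs)
```

Repeating such a computation per channel yields one feature vector per recording, which is the kind of input a classifier such as KNN operates on.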
Wavelet analysis was employed to capture additional information about the EEG signals. This involved using both the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). The CWT entails convolving the EEG signal with a continuous mother wavelet that is scaled and shifted. In contrast, the DWT is obtained through a filter bank approach, where the signal is passed through a series of high-pass and low-pass filters to decompose it into approximation and detail coefficients at different resolution levels.
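The filter bank view of the DWT can be sketched with the Haar wavelet, the simplest case. The mother wavelet and number of levels used in the study are not stated, so the Haar choice, the three levels, and the function names below are assumptions for illustration only.

```python
import numpy as np


def haar_dwt_level(x):
    """One filter-bank stage: low-pass and high-pass Haar filters, then downsample by 2."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:                       # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # low-pass branch
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # high-pass branch
    return approx, detail


def haar_dwt(x, levels):
    """Multi-level decomposition: the approximation is split again at each level."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details


# Toy signal of 256 samples decomposed over three levels.
rng = np.random.default_rng(1)
signal = rng.standard_normal(256)
approx, details = haar_dwt(signal, levels=3)
```

Each level halves the number of coefficients, so the detail arrays here have 128, 64, and 32 entries. Summary statistics of the coefficients at each level are a common way to turn such a decomposition into features.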

Feature selection.
In the context of high-dimensional data, feature selection has become an essential component of the learning process. Proper feature selection can lead to improvements in learning speed, generalization capacity, and the simplicity of the inferred model. For this study, we employed a feature selection method known as Univariate Feature Selection [38]. This method calculates the ANOVA F-value for each feature with respect to the target vector. This statistical measure helps identify the features that exhibit the most significant relationships with the target variable. These selected features can provide valuable information for the classification task.
The use of Univariate Feature Selection in this study is justified by its effectiveness in handling high-dimensional data and identifying the most relevant features for the classification task [39]. By employing univariate feature selection, the study aims to streamline the feature set, reduce dimensionality, and enhance the interpretability and generalization capacity of the machine learning model [40]. This approach is particularly important in the context of EEG signal analysis and cognitive tests, where a multitude of features are extracted, as it ensures that only the most discriminative and informative features contribute to the classification of BD II and BPD. The choice of Univariate Feature Selection aligns with the technical goal of improving learning speed, model simplicity, and generalization performance [39,40].
In our analysis, we used the scikit-learn 1.2.1 library to perform the feature selection process. This widely used library provides a comprehensive set of tools and functions for machine learning tasks, including feature selection techniques.
Our dataset consisted of a total of 318 features, including 84 spectral features, 84 statistical features, 126 wavelet features, 12 features derived from WCST, and 5 features from the ICA test. Each type of feature provided unique insights into the characteristics and patterns present in the data.
After the feature selection process was applied, the number of features was reduced to 11. This reduction in dimensionality aimed to retain only the most informative and discriminative features for the classification of BPD and BD II.
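A reduction from 318 features to 11 by ANOVA F-value can be sketched with scikit-learn's `SelectKBest` and `f_classif`. The text names the library and the method but not the exact call, so this is one plausible realization rather than the authors' code. The data are synthetic, and the group sizes (20 and 25) and `k=11` are taken from the text only to make the shapes comparable.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the feature matrix: 45 participants x 318 features.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 25)   # 0 and 1 are arbitrary labels for the two groups
X = rng.normal(size=(y.size, 318))
X[:, :3] += 2.0 * y[:, None]        # make the first three features informative

# Score each feature independently by its ANOVA F-value and keep the top 11.
selector = SelectKBest(score_func=f_classif, k=11)
X_selected = selector.fit_transform(X, y)
selected_idx = selector.get_support(indices=True)
```

Because every feature is scored on its own, this approach is fast but ignores interactions and redundancy between features. To avoid optimistic performance estimates, the selector would normally be fitted on training data only.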

Data augmentation.
Data augmentation is a technique that increases the size and diversity of a dataset. By generating new samples from existing data, data augmentation aims to enhance the performance and generalizability of machine learning models. Having a larger and more comprehensive dataset can help mitigate issues such as overfitting and improve the model's ability to capture underlying patterns.
In this study, we used SMOTE, as implemented in the imbalanced-learn 0.10.1 library, for data augmentation to address class imbalance, where the number of samples in different classes is uneven. By oversampling the minority class and synthesizing new samples, SMOTE helps to balance the dataset.
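The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched in a few lines of NumPy. This is a simplified illustration written for this purpose, not the imbalanced-learn implementation used in the study. The function name `smote_like_oversample`, the value `k=5`, and the synthetic data are assumptions.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest neighbors

    base = rng.integers(0, n, size=n_new)                      # random base samples
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]   # one random neighbor each
    gap = rng.random((n_new, 1))                               # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy example: 20 minority samples with 11 features, topped up to match 25.
rng = np.random.default_rng(42)
X_minority = rng.normal(size=(20, 11))
synthetic = smote_like_oversample(X_minority, n_new=5)
X_balanced_minority = np.vstack([X_minority, synthetic])
```

Every synthetic point lies on the line segment between two real minority samples, so no values outside the observed range are produced. Oversampling is usually restricted to the training split so that synthetic points derived from test samples do not leak into training.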
The F1 score and balanced accuracy are particularly valuable when dealing with imbalanced datasets [41][42][43]. The F1 score, which combines precision and recall, provides a balanced measure that accounts for both false positives and false negatives. It offers a more comprehensive assessment of the model's ability to correctly classify both classes, giving equal importance to both precision and recall [44][45][46][47][48].
Balanced accuracy is another metric that is commonly used to evaluate classification models, particularly in imbalanced datasets. It is calculated as the average of sensitivity (true positive rate) and specificity (true negative rate) [49]. By considering both the model's ability to correctly identify positive instances and its ability to correctly identify negative instances, balanced accuracy provides a more equitable evaluation of the model's performance. It gives equal weight to both classes and helps in assessing the model's overall effectiveness in classifying instances from both classes.
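The metrics described above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions for a binary problem. The counts in the example are hypothetical and are not taken from the study, and the function name is introduced here for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, balanced accuracy, and F1 score from confusion-matrix counts.

    Assumes every denominator is non-zero (each class occurs at least once
    and at least one positive prediction is made).
    """
    sensitivity = tp / (tp + fn)      # true positive rate (recall)
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

With these counts, sensitivity is 0.8 and specificity is 0.9, so balanced accuracy is their mean, 0.85. Accuracy and balanced accuracy diverge when the classes are of unequal size and the model performs unevenly across them.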

Demographics
A total of 45 participants were included in the study: 25 with BPD and 20 with BD II. The BPD group was predominantly female (N = 21, 84%) and the BD II group was predominantly male (N = 13, 65%). Baseline characteristics of the participants are presented in Table 1.

Prediction accuracy
Table 2 summarizes the performance of the classification algorithms used to differentiate between BD II and BPD, reporting accuracy, balanced accuracy, and F1 score for each algorithm. The K-Nearest Neighbors (KNN) algorithm demonstrated the highest performance, achieving an accuracy of 89%, a balanced accuracy of 93%, and an F1 score of 90. Label Propagation, Linear Discriminant Analysis, Extra Trees, and Label Spreading showed identical performance, with 78% accuracy, 86% balanced accuracy, and an F1 score of 80. These algorithms performed well but were less effective than KNN. Support Vector Machine (SVM) and the Passive Aggressive Classifier displayed moderate performance, with SVM achieving 67% accuracy, 79% balanced accuracy, and an F1 score of 69, while the Passive Aggressive Classifier had 78% accuracy, 68% balanced accuracy, and an F1 score of 78. The Calibrated Classifier had lower performance, with 56% accuracy, 71% balanced accuracy, and an F1 score of 58. Nearest Centroid, Bagging Classifier, Ridge Classifier, and Random Forest had similar moderate performance, with 67% accuracy, 61% balanced accuracy, and an F1 score of 69. Several algorithms, including Logistic Regression, XGBClassifier, and Linear SVM, showed lower performance, with 56% accuracy, 54% balanced accuracy, and an F1 score of 59. Quadratic Discriminant Analysis and AdaBoost had particularly low performance, with an accuracy of around 44%, a balanced accuracy of 46%, and F1 scores below 50. The Decision Tree and Stochastic Gradient Descent classifiers performed the worst, with accuracies of 33% and 44%, respectively, and the lowest balanced accuracies and F1 scores among all algorithms.
Overall, the distribution of results highlights the effectiveness of KNN on this dataset and the difficulties faced by other algorithms. The choice of algorithm significantly affects classification accuracy, indicating the importance of selecting an appropriate model for differentiating between BD II and BPD using EEG and cognitive data.

WCST.
No significant differences were found between the groups on any WCST variable. In addition, education level was not significantly associated with any of the WCST variables (Table 4).

ICA test.
The ICA index differed significantly between the two groups (P = 0.001) (Fig 2). Differences in the other variables between the groups were non-significant (Table 5).

Discussion
Our findings indicated that the KNN algorithm had a high balanced accuracy. In the classification process, EEG signals were identified as significant features, and cognitive features were given less weight.

Literature review and comparison with previous studies

EEG-based diagnoses.
Numerous studies have examined the classification and diagnosis of mental disorders and neurological conditions using EEG [34][50][51][52][53][54][55]. In a study on BD, a machine learning approach based on EEG signals and XGB demonstrated remarkable performance, achieving a high prediction accuracy of 94%, precision exceeding 94%, and recall surpassing 94% [34]. In the realm of epilepsy, EEG-based methodologies have shown promise in seizure detection [50]. One study proposed a real-time EEG-based approach utilizing discrete wavelet transform, attaining an accuracy of 97% and a sensitivity of 96.67% in the UB dataset. In the CHB-MIT dataset, the method achieved an accuracy of 96.38%, a sensitivity of 96.15%, and a low false positive rate of 3.24% [51]. Another study focused on the detection of severe psychiatric disorders using EEG signals. The machine learning model incorporated Quantum Local Binary Pattern (QLBP) functions and wavelet packet decomposition, achieving high accuracy rates of 97.47%, 94.36%, and 93.49% for detecting intellectual disability, schizophrenia spectrum disorders, and depressive disorders, respectively [52]. In the field of neurological disorders, an automatic seizure detection method was proposed, employing signal decomposition representations, feature extraction using discrete wavelet decomposition, and machine learning techniques. The classification accuracy reached up to 100% using Support Vector Machine (SVM), KNN, and Linear Discriminant Analysis (LDA) [53]. Additionally, another study presented a method for automatically diagnosing epileptic seizures using EEG signals, utilizing data mining and machine learning techniques such as discrete wavelet transform and ANOVA-based feature ranking. The method, employing Least Squared SVM (LS-SVM), KNN, and Naive Bayes (NB), achieved an average accuracy of 99.5%, a sensitivity of 99.01%, and a specificity of 100%, proving effective in diagnosing epileptic seizures [54]. Recently, a study introduced a Radial Basis Function Neural Network (RBFNN) for classifying EEG signals related to epileptic seizures, utilizing discrete wavelet decomposition as the feature extraction method. The proposed method, optimized using a modified Particle Swarm Optimization (PSO) algorithm, outperformed other techniques, reaching a maximum accuracy of 99% [55].
Finally, a machine learning framework was developed for diagnosing Major Depressive Disorder (MDD) using EEG signals. The framework integrated various feature extraction methods employing discrete wavelet decomposition and the Sequential Backward Floating Search (SBFS) algorithm. This method achieved impressive results, with an average accuracy of 99%, a sensitivity of 98.4%, a specificity of 99.6%, an F1 score of 98.9%, and an insignificant false discovery rate of 0.4%, suggesting its potential as a diagnostic tool for MDD [34].
The application of machine learning methods, particularly those involving EEG signals, in distinguishing psychiatric disorders has yielded varied outcomes in prior research. For example, Arikan et al. (2019) examined resting EEG recordings of healthy individuals, BD II patients, and BPD patients and found no significant differences between the two clinical groups, suggesting biological similarity between BD II and BPD [13]. However, the high balanced accuracy achieved in our study suggests that distinctive EEG patterns may be identifiable when appropriate analytical techniques are applied.
Comparing our results to studies by Bayes et al. in 2021 and 2022, we observed a substantial enhancement in classification performance. While their studies reported accuracy rates ranging from 73.1% to 73.9%, our model achieved an accuracy of 89% and a balanced accuracy of 93% [21,23]. This improvement may be attributable to the incorporation of EEG signals, which provide deeper insights into the neurological aspects of these disorders. Additionally, the use of a diverse set of machine learning techniques, such as KNN, SVM, Decision Trees (DT), and XGB, as discussed by Baker et al. in 2023, plays a crucial role in achieving accurate predictions. It is noteworthy that our study outperformed the XGB algorithm reported by Baker et al., indicating that the KNN algorithm we employed is particularly well-suited for this specific classification task [56].

Cognitive tests in mental disorder diagnosis.
Recent research has highlighted the significance of self-awareness in the treatment of psychiatric disorders, such as schizophrenia, BD and BPD [57][58][59][60]. One study (Martin, 2023) emphasizes incorporating clinical and cognitive measures in psychotherapy [57]. A study seeking to identify key pathological personality traits and/or symptoms associated with psychotic features in BPD and BD found shared predictors such as detachment, negative affectivity, psychoticism, depressiveness, grandiosity, suspiciousness, and interpersonal sensitivity symptoms. Paranoid ideation stood out in BPD. The study suggests an overlap between BPD and the schizoaffective/psychosis spectra [58]. Another study investigated attentional bias in patients with BD and BPD. Patients with BD II exhibited higher attentional bias scores than those with BPD and controls. This approach sheds light on cognitive differences distinguishing the two disorders [25].
The contrasting outcomes between cognitive features and EEG signals in our study warrant further exploration. Our findings indicated that cognitive features were not as influential as EEG signals in distinguishing between BD II and BPD, as the applied feature selection methods removed cognitive features while retaining EEG features. This result is consistent with the argument that cognitive features alone might not be sufficient to differentiate between these two disorders effectively. Rather, the underlying neurological patterns captured through EEG provide more discriminative information. This underscores the importance of leveraging neurobiological data to enhance the accuracy of diagnostic differentiation, aligning with the earlier finding that mood prediction was most accurate when considering interrelationships between different mood elements captured through signature-based learning [24].

Strengths and limitations
The integration of multiple machine learning algorithms enhances the reliability of the classification system, minimizing overreliance on a single approach. However, our study has certain limitations that should be acknowledged. The most prominent limitations are the small sample size and the exclusion of patients with BD I, which might affect the generalizability of our findings. Additionally, patients with comorbid BD II and BPD were not included. Moreover, our study relied on interviews based on DSM-5 criteria, and therefore machine learning approaches remain a preliminary step in separating the disorders until objective biomarkers are identified.

Implications for policy, practice and future research
Future research should focus on incorporating additional data sources, such as genetic and neuroimaging data, to improve diagnostic accuracy. Furthermore, the integration of deep learning and other advanced machine learning techniques could offer additional improvements in classification.

Conclusions
We found that the KNN algorithm had a high balanced accuracy, and that machine learning is a promising tool for differentiating BD II and BPD based on EEG signals and the ICA test, but not the WCST. Further research is needed to strengthen the body of evidence on this matter.
This cross-sectional study was conducted in the Brain and Cognition Clinic (affiliated with Institute for Cognitive Sciences Studies and Iran University of Medical Sciences, Tehran, Iran) from June 2022 to March 2023. It was approved by the Ethics Committee of the Iran University of Medical Sciences Institutional Review Board (IR.IUMS.REC.1401.129) and carried out based on the Declaration of Helsinki and subsequent revisions. Written informed consent was obtained from all participants, and their data were used anonymously.

Classification techniques.
A range of classification algorithms, including Support Vector Machines (SVM), KNN, Random Forests (RF), and Neural Networks (NN), among others, were evaluated for accuracy (ACC), balanced accuracy, and F1 score. The validity of the classification algorithms was assessed using formulas based on the following quantities:
TP = true positive (the correctly predicted positive class outcome of the model);
TN = true negative (the correctly predicted negative class outcome of the model);
FP = false positive (the incorrectly predicted positive class outcome of the model);
FN = false negative (the incorrectly predicted negative class outcome of the model).
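A comparison of several classifiers on the three reported metrics can be sketched with scikit-learn as follows. This is an illustrative setup on synthetic data, not the study's pipeline. The train/test split, hyperparameters (such as `n_neighbors=3`), feature scaling, and the choice of three models are assumptions, and the resulting numbers have no bearing on the values in Table 2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data with 11 features (mirroring the selected feature count).
X, y = make_classification(n_samples=200, n_features=11, n_informative=5,
                           weights=[0.45, 0.55], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Distance- and margin-based models are scaled; the settings are assumptions.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
```

With small samples, a single held-out split gives coarse and unstable estimates, so repeated or stratified cross-validation is a common alternative when comparing many algorithms.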